The goal of this exploration is to develop a hierarchical Bayesian model to estimate the true case fatality rate (CFR) for each county. In particular, we will start by taking advantage of the grouping of counties within states. The result of the model will be a “denoised” estimate of the CFR for each county in the country.
The initial motivation for this exploration is to use the distribution of the denoised CFR across counties to select the two shape parameters of a beta prior distribution. That prior will enable the analytic calculation of a denoised posterior CFR for an arbitrary county, taking advantage of the conjugacy between a beta prior and a binomial likelihood.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000000 0.008929 0.018868 0.024723 0.033333 0.312500
The most extreme CFRs come from counties with small numbers of cases.
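The effect of case counts on the raw CFR can be made concrete with the binomial standard error. The sketch below is in Python purely for illustration (the outputs in this document appear to come from R). The function name `cfr_standard_error` and the case counts are hypothetical, and the true CFR of 0.025 is simply a value close to the observed mean.

```python
import math

def cfr_standard_error(true_cfr: float, cases: int) -> float:
    """Standard error of deaths / cases under a binomial model."""
    return math.sqrt(true_cfr * (1.0 - true_cfr) / cases)

# Hypothetical counties sharing the same underlying CFR but with different case counts.
for cases in (16, 400, 10_000):
    se = cfr_standard_error(0.025, cases)
    print(f"cases={cases:>6}  standard error={se:.4f}")
```

With only 16 cases the standard error is larger than the CFR itself, so raw values far from 0.025 are expected by chance alone.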
There appears to be meaningful clustering of CFR within states, which suggests that a model with a random effect for state is appropriate.
We fit a binomial model to estimate an adjusted (denoised) CFR for each county by shrinking the county CFR towards the state CFR and shrinking the state CFR towards the national CFR. The prior \(N(0,1.6)\) on the intercept is chosen because this prior on the logit scale is approximately uniform over [0,1] when transformed to the probability scale.
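The claim about the intercept prior can be checked with a change of variables. The sketch below assumes that 1.6 is the standard deviation of the normal prior (the convention brms uses); the function name `implied_density` is hypothetical.

```python
import math

PRIOR_SD = 1.6  # assumed to be the standard deviation of the prior on the logit scale

def implied_density(p: float, sd: float = PRIOR_SD) -> float:
    """Density of p = inv_logit(x) when x ~ Normal(0, sd), by change of variables."""
    x = math.log(p / (1.0 - p))  # logit(p)
    normal_pdf = math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
    return normal_pdf / (p * (1.0 - p))  # Jacobian: dx/dp = 1 / (p * (1 - p))

# A Uniform(0, 1) density would equal 1 everywhere.
for p in (0.01, 0.1, 0.25, 0.5):
    print(f"p={p:<5} implied density={implied_density(p):.3f}")
```

The implied density stays close to 1 for probabilities in the middle of the range and drops off toward 0 and 1, so "approximately uniform" holds best away from the extremes.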
I should note that with these priors, this simple model probably does not require Stan to fit (we could use, e.g., glmer). However, the Stan machinery will be needed if we make the model more complex.
We could in theory obtain more precise estimates by placing a more informative prior on the national CFR; however, the gains in precision would likely be small given that there is ample data to estimate the national CFR.
## prior class coef group resp dpar nlpar bound
## 1 normal(0, 1.6) Intercept
## 2 normal(0, 1) sd
## 3 sd fips
## 4 sd Intercept fips
## 5 sd state
## 6 sd Intercept state
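Taken together with the model formula `deaths | trials(cases) ~ (1 | state) + (1 | fips)`, the priors listed above correspond to the following model. This is a sketch in notation introduced here for illustration: the symbols \(\alpha\), \(u\), \(v\), and \(\sigma\) do not appear in the original, the second argument of \(N\) is taken to be a standard deviation, and \(N^{+}\) denotes a normal distribution truncated to positive values, which is how brms treats priors on `sd` parameters.

\[
\begin{aligned}
\text{deaths}_i &\sim \text{Binomial}(\text{cases}_i,\ p_i) \\
\operatorname{logit}(p_i) &= \alpha + u_{s[i]} + v_i \\
u_s &\sim N(0,\ \sigma_{\text{state}}), \qquad v_i \sim N(0,\ \sigma_{\text{county}}) \\
\alpha &\sim N(0,\ 1.6), \qquad \sigma_{\text{state}},\ \sigma_{\text{county}} \sim N^{+}(0,\ 1)
\end{aligned}
\]

Here \(i\) indexes counties and \(s[i]\) is the state containing county \(i\).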
The estimated standard deviations of the state random effect (0.54) and the county random effect (0.58) are roughly the same and are relatively large on the logit scale, suggesting that there is meaningful variation in CFR both within and between states.
## Family: binomial
## Links: mu = logit
## Formula: deaths | trials(cases) ~ (1 | state) + (1 | fips)
## Data: dat (Number of observations: 3129)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Group-Level Effects:
## ~fips (Number of levels: 3129)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.58 0.01 0.56 0.60 1.00 857 1766
##
## ~state (Number of levels: 51)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.54 0.06 0.44 0.69 1.00 815 1315
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -3.84 0.08 -3.98 -3.68 1.01 361 572
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
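The estimates in the summary are on the logit scale. A minimal Python sketch of the back-transformation, using the posterior means reported above (the helper `inv_logit` and the constant names are hypothetical):

```python
import math

def inv_logit(x: float) -> float:
    """Map a value on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

INTERCEPT = -3.84  # posterior mean of the population-level intercept
SD_COUNTY = 0.58   # posterior mean of sd(Intercept) for fips

typical = inv_logit(INTERCEPT)
# A county one standard deviation below or above average, within an average state.
low = inv_logit(INTERCEPT - SD_COUNTY)
high = inv_logit(INTERCEPT + SD_COUNTY)

print(f"average county in an average state: {typical:.4f}")
print(f"+/- 1 county SD: {low:.4f} to {high:.4f}")
```

The intercept corresponds to a CFR of roughly 2.1% for an average county in an average state. This is a different quantity from a case-weighted national CFR, which pools all cases and deaths.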
The adjusted national CFR is essentially the same as the unadjusted national CFR, because there is ample data to estimate this CFR.
| cases | deaths | CFR | CFR_adj |
|---|---|---|---|
| 6712730 | 207124 | 0.03086 | 0.03086 |
The adjusted state CFRs are very close to the unadjusted state CFRs, because there is ample data to estimate these as well. The regularization on the state random effect may become more important if we do something more complex, such as incorporating a time trend that interacts with state.
The adjustment on the county-level CFR is much more meaningful. The many points substantially below the diagonal on this plot indicate counties where the unadjusted CFR is very large but the adjusted (denoised) CFR is much more moderate.
The relationship between CFR and number of cases looks different after adjustment, in that CFRs for counties with few cases have been shrunk toward more moderate values. Interestingly, there is a slight positive relationship (approximately linear on the log scale) between CFR and cases, for both adjusted and unadjusted CFR.
| shape1 | shape2 |
|---|---|
| 2.842 | 111.3 |
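Assuming `shape1` and `shape2` above are the parameters of a beta distribution fitted to the county CFRs, the conjugate update for a single county takes one line. The sketch below is in Python for illustration; the function names and the example county (1 death in 10 cases) are hypothetical.

```python
def beta_binomial_posterior(shape1, shape2, deaths, cases):
    """Posterior beta parameters after observing `deaths` out of `cases`."""
    return shape1 + deaths, shape2 + (cases - deaths)

def beta_mean(shape1, shape2):
    """Mean of a Beta(shape1, shape2) distribution."""
    return shape1 / (shape1 + shape2)

PRIOR = (2.842, 111.3)  # fitted shape parameters from the table above
deaths, cases = 1, 10   # hypothetical small county

posterior = beta_binomial_posterior(*PRIOR, deaths, cases)
print(f"raw CFR:        {deaths / cases:.4f}")
print(f"prior mean:     {beta_mean(*PRIOR):.4f}")
print(f"posterior mean: {beta_mean(*posterior):.4f}")
```

For this hypothetical county the raw CFR of 0.10 is pulled most of the way back toward the prior mean of about 0.025, because 10 cases carry little weight against a prior worth roughly 114 pseudo-observations.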
| state | shape1 | shape2 | n_counties | deaths | cases | mean_CFR | mean_CFR_fitted |
|---|---|---|---|---|---|---|---|
| DE | 76.26 | 2090 | 3 | 645 | 19086 | 0.0353 | 0.0352 |
| HI | 17.14 | 934.5 | 4 | 152 | 11375 | 0.01478 | 0.01802 |
| RI | 3.609 | 67.5 | 5 | 1092 | 21374 | 0.05063 | 0.05075 |
| CT | 4.854 | 60.55 | 8 | 4513 | 55406 | 0.07417 | 0.07421 |
| NH | 6.622 | 190.4 | 10 | 442 | 7941 | 0.02801 | 0.03361 |
| MA | 6.118 | 73.64 | 14 | 9502 | 133599 | 0.07335 | 0.07672 |
| VT | 12.35 | 485.6 | 14 | 58 | 1707 | 0.01698 | 0.02481 |
| AZ | 11.57 | 321.6 | 15 | 5705 | 214015 | 0.03479 | 0.03473 |
| ME | 3.187 | 86.42 | 16 | 142 | 5075 | 0.03423 | 0.03557 |
| NV | 19.16 | 1006 | 16 | 1620 | 75774 | 0.01301 | 0.01869 |
| NJ | 14.07 | 153.9 | 21 | 16135 | 199432 | 0.08393 | 0.08379 |
| WY | 7.769 | 607.5 | 23 | 53 | 4869 | 0.01402 | 0.01263 |
| MD | 6.709 | 195.8 | 24 | 3945 | 120156 | 0.03274 | 0.03313 |
| AK | 39.01 | 4271 | 26 | 58 | 6834 | 0.008358 | 0.00905 |
| UT | 4.641 | 519.9 | 29 | 475 | 63732 | 0.008107 | 0.008848 |
| NM | 5.46 | 194.2 | 33 | 890 | 26215 | 0.02757 | 0.02734 |
| OR | 11.84 | 704 | 35 | 563 | 30795 | 0.01314 | 0.01654 |
| WA | 5.079 | 254 | 39 | 2139 | 82248 | 0.01879 | 0.01961 |
| ID | 4.404 | 301.8 | 44 | 480 | 37488 | 0.01551 | 0.01438 |
| SC | 8.218 | 263.2 | 46 | 3442 | 137708 | 0.0305 | 0.03028 |
| ND | 10.15 | 634.2 | 53 | 271 | 17954 | 0.01765 | 0.01576 |
| MT | 9.251 | 527.9 | 54 | 186 | 10299 | 0.01492 | 0.01722 |
| WV | 4.799 | 181.5 | 55 | 357 | 14049 | 0.02512 | 0.02576 |
| CA | 4.763 | 269.9 | 59 | 16142 | 785501 | 0.01604 | 0.01734 |
| NY | 3.034 | 61.18 | 62 | 32773 | 449900 | 0.0466 | 0.04726 |
| CO | 6.718 | 260.1 | 63 | 2057 | 64827 | 0.02066 | 0.02517 |
| LA | 8.091 | 231.5 | 64 | 5387 | 160971 | 0.0335 | 0.03377 |
| SD | 9.98 | 657.5 | 66 | 248 | 18695 | 0.0143 | 0.01495 |
| AL | 3.672 | 172.2 | 67 | 2558 | 144960 | 0.02134 | 0.02088 |
| FL | 5.477 | 251.1 | 67 | 14628 | 682155 | 0.02108 | 0.02135 |
| PA | 4.752 | 111.4 | 67 | 8199 | 150327 | 0.03814 | 0.04092 |
| WI | 6.651 | 616.9 | 72 | 1372 | 107291 | 0.01089 | 0.01067 |
| AR | 4.291 | 183.5 | 75 | 1407 | 74032 | 0.02246 | 0.02286 |
| OK | 6.018 | 367.8 | 77 | 1051 | 76725 | 0.01623 | 0.0161 |
| MS | 8.065 | 218.8 | 82 | 3015 | 93361 | 0.03566 | 0.03555 |
| MI | 4.574 | 125.1 | 83 | 7043 | 122119 | 0.03468 | 0.03526 |
| MN | 3.641 | 220.9 | 87 | 2073 | 89806 | 0.01667 | 0.01622 |
| OH | 3.815 | 99.83 | 88 | 4925 | 144308 | 0.0367 | 0.03681 |
| NE | 4.214 | 255.6 | 91 | 497 | 40882 | 0.01713 | 0.01622 |
| IN | 3.641 | 108.5 | 92 | 3669 | 113914 | 0.03172 | 0.03247 |
| TN | 6.45 | 408.8 | 95 | 2525 | 177686 | 0.01552 | 0.01553 |
| IA | 3.998 | 214.6 | 99 | 1377 | 79995 | 0.01817 | 0.01829 |
| NC | 5.203 | 231.1 | 100 | 3629 | 193546 | 0.02215 | 0.02202 |
| IL | 4.9 | 207.6 | 102 | 8774 | 274198 | 0.02184 | 0.02306 |
| KS | 6.049 | 402 | 105 | 686 | 52684 | 0.01653 | 0.01483 |
| MO | 4.421 | 297.4 | 115 | 2170 | 112764 | 0.01404 | 0.01465 |
| KY | 4.551 | 217.3 | 120 | 1205 | 61515 | 0.01921 | 0.02051 |
| VA | 3.579 | 127.9 | 133 | 3270 | 140488 | 0.02714 | 0.02722 |
| GA | 4.192 | 129.4 | 159 | 6966 | 286660 | 0.03196 | 0.03138 |
| TX | 5.7 | 179 | 251 | 15984 | 701334 | 0.03382 | 0.03086 |
The number in parentheses after each state indicates the number of counties.
Where the two estimates differ, adjusted CFRs from the empirical Bayes (EB) approach are generally lower than those from the model. To understand how similar these two adjusted estimates are relative to the unadjusted CFR, we have to look at the unadjusted CFR as well.
In most cases, the two adjusted CFRs are similar (relative to the unadjusted CFR). The Empirical Bayes adjustment tends to shrink the CFR to slightly lower values than the model adjustment.
One possible explanation for differences between the two adjustment methods is that in the model, there is a single variance for the county random effects across states, while in the EB method, the fitted beta distributions have differing variances by state.
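The state-specific beta fits imply very different amounts of shrinkage. Under a beta prior, the posterior mean is a weighted average of the raw CFR and the prior mean, with weight `cases / (cases + shape1 + shape2)` on the raw CFR. The Python sketch below uses the fitted values for three states from the state table; the function name `data_weight` and the county size of 50 cases are hypothetical.

```python
# (shape1, shape2) copied from the state table for three states.
STATE_PRIORS = {
    "DE": (76.26, 2090.0),
    "NY": (3.034, 61.18),
    "TX": (5.7, 179.0),
}

def data_weight(shape1: float, shape2: float, cases: int) -> float:
    """Weight placed on the raw CFR in the beta-binomial posterior mean."""
    return cases / (cases + shape1 + shape2)

CASES = 50  # hypothetical county size
weights = {state: data_weight(a, b, CASES) for state, (a, b) in STATE_PRIORS.items()}
for state, w in weights.items():
    print(f"{state}: weight on raw CFR = {w:.3f}")
```

A 50-case county would keep about 44% of its raw CFR under the NY prior but only about 2% under the DE prior, whereas the hierarchical model applies one county-level standard deviation to every state.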
The underreporting factors are calculated assuming a true mortality rate of 0.0138. The distribution of estimated underreporting factors is much more heavy-tailed for the unadjusted CFRs than for the adjusted CFRs.
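The text does not give the formula for the underreporting factor. The Python sketch below assumes it is the ratio of a CFR to the assumed true mortality rate; that definition and the function name `underreporting_factor` are assumptions made for illustration.

```python
TRUE_MORTALITY_RATE = 0.0138  # assumed true rate, as stated in the text

def underreporting_factor(cfr: float, true_rate: float = TRUE_MORTALITY_RATE) -> float:
    """ASSUMED definition: observed CFR divided by the assumed true mortality rate."""
    return cfr / true_rate

national = underreporting_factor(0.03086)  # pooled national CFR from the table
extreme = underreporting_factor(0.3125)    # largest unadjusted county CFR
print(f"national: {national:.2f}")
print(f"largest unadjusted county CFR: {extreme:.2f}")
```

Under this assumed definition, a single noisy county CFR translates directly into an implausibly large factor, which is consistent with the heavier tail for the unadjusted CFRs.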